MINISTAT: A Statistical Package for the Commodore 64
Copyright 1989 by Jon Rich, Ph.D.
MINISTAT is a statistical package program which performs both univariate and
bivariate inferential and descriptive statistics. A particularly useful
feature of this package is that the data need to be entered only once.
Once the data file has been set up, one may perform any of the included
statistical tests on any of the variables.
A MINISTAT data file is a two dimensional array, or table, of data. One
dimension is the variables. These may be subject characteristics, such as
sex or race, subject measurements, such as height, test scores, or running
speed, or any other characteristic on which subjects vary. The other
dimension is the cases, or subject number. Data for a typical MINISTAT
file is shown below:
                     Variables
            SEX  RACE  HT.  WT1  WT2
         1    1     1   68  143  140
Case     2    2     2   60  105  103
No.      3    1     3   69  162  153
         4    1     2   70  168  160
         5    2     3   63  115  118
         6    2     1   65  123  125
         7    1     2   69  149  147
         8    2     2   67  145  140
         9    1     3   67  123  119
        10    2     1   64  122  114
These data are from a test of a weight-loss diet. For each of the ten
persons in the test, the researcher has recorded the sex (1=male, 2=female),
the race (1=Black, 2=White, 3=Oriental), the height in inches, and the
weight before (WT1) and after (WT2) the diet. Using MINISTAT, we can
answer a number of questions about these data.
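For readers who would like to check the results in this manual outside of MINISTAT, the DIET table can be written as a small Python structure. This is only an illustrative sketch: the name `diet`, the helper `case`, and the one-list-per-variable layout are assumptions made for the example, not a description of MINISTAT's own file format.

```python
# The DIET data from the table above, stored as one list per variable.
# The dictionary layout is an assumption made for illustration only.
diet = {
    "SEX":  [1, 2, 1, 1, 2, 2, 1, 2, 1, 2],
    "RACE": [1, 2, 3, 2, 3, 1, 2, 2, 3, 1],
    "HT.":  [68, 60, 69, 70, 63, 65, 69, 67, 67, 64],
    "WT1":  [143, 105, 162, 168, 115, 123, 149, 145, 123, 122],
    "WT2":  [140, 103, 153, 160, 118, 125, 147, 140, 119, 114],
}

n_vars = len(diet)           # one dimension: the variables
n_cases = len(diet["SEX"])   # the other dimension: the cases (N)


def case(number):
    """Return one row of the table; case numbers start at 1."""
    return {name: values[number - 1] for name, values in diet.items()}


print(n_vars, n_cases)
print(case(3))
```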
STARTING THE PROGRAM
Start the program by entering
LOAD "MINISTAT",8
and then entering RUN. At the title screen you will be given
the opportunity to toggle the color between black on white
and white on blue by pressing the space bar. Choose the color
combination which is easiest to read, and then proceed to the main
menu by pressing "C". You will then see the main menu, which looks
like this:
    SELECT
    A) SAVE    1) DESC
    B) INFO    2) FREQ
    C) OLD     3) REGR
    D) DIR     4) CHI2
    E) NEW     5) T:US
    F) KILL    6) T:RS
    G) COMP    7) ALPHA
    H) HELP
At this point, there is no data file loaded into the program. The only
options on the menu which will work are "C", which will allow you to
retrieve a previously saved file, "D", which will display the catalog of
previously saved MINISTAT files, "E", which will allow you to input a new
file, "F", which will erase a previously saved file, and "H", which will
allow you to view the help files.
SETTING UP THE FILE (Option "E")
To set up a new file, hit "E" at the main menu. You will be asked for
a file name. This can be anything you wish that you can easily associate
with your study. We will name this file "DIET." Next you are asked
"N VARS?" This means "How many variables are in the file?" In our
example there are five variables, so we enter the number 5. We
are then asked for N, and we enter 10, meaning there are ten subjects
in the study. N must be from 2 to 100, and the number of variables
must be from one to 30.
Next, MINISTAT asks, NAMES (y or n)?. This means, "Would you like to name
the variables?" If we press N, indicating No, MINISTAT will assign the
variables the names V1, V2, etc., and go straight into the data entry
section. If we press Y, we will be given an opportunity to assign
our own names. For our example, we will press Y. MINISTAT then asks
NAME1? and we enter SEX, the name of our first variable. After NAME2
we input RACE, after NAME3 HT., and so on.
Once the file characteristics have been input, we are ready to input the
actual data. MINISTAT will ask for the first case of the first variable,
continuing down through every case of the first variable, and then go on
to subsequent variables. For example, MINISTAT will initially print
SEX - CASE 1?, and we will enter a 1, indicating that the sex of the
first subject is male. If we make a mistake, we can back up one case by
pressing ENTER at the next prompt without typing a value. The data entry
might look like this:
    SEX -- CASE 1: 1
    SEX -- CASE 2: 2
    SEX -- CASE 3: 3     (this value is a mistake)
    SEX -- CASE 4:       (enter, we back up)
    SEX -- CASE 3: 1
    SEX -- CASE 4: 1
          *
          *
          *
    WT2 -- CASE 9: 119
    WT2 -- CASE 10: 114
SELECTING A PROCEDURE
After the data have been entered, the other options in the menu become
available. Letters (A through H) select utility procedures; numbers
(1 through 7) select statistical procedures. A procedure is selected
by simply pressing the corresponding number or letter -- you do not
need to press enter.
Procedures that require that a variable be selected will produce a prompt
mark: >?. This mark indicates that a variable name should be entered.
Some procedures require that more than one variable be entered and will
produce this mark again until all variables have been entered.
If you input an unrecognized name, two question marks will be
printed.
After a procedure has been executed, you will be asked, if appropriate,
AGAIN (y or n)?. If you would like to perform the same procedure with
different variables or parameters, type Y. If you want to return to the
main menu, type N. Detailed descriptions of each procedure are listed
below.
1) DESC
This procedure generates descriptive statistics for any of the variables.
If we enter variable WT1, the description looks like this:
    MEAN: 135.5        VAR:  393.25
    S.D.: 19.830532    S.E.: 6.27096
    SUM:  1355         N:    10
    MAX:  168          MIN:  105
Here is what each of these statistics means:
N: The total number of subjects in the sample.
SUM: The sum of all the scores or measurements.
MEAN: This is the average value, the sum divided by N.
MAX, MIN: The maximum and minimum. The heaviest person in this sample
weighed 168 lbs., the lightest 105 lbs.
VAR: This is the variance of the sample -- to what degree the scores are
spread out or clustered together. In the output above it is the sum of
squared deviations from the mean divided by N.
S.D.: The standard deviation, which is the square root of the variance.
In large samples that are roughly normally distributed, about 68% of the
scores will fall within one standard deviation of the mean, and about 95%
within two standard deviations.
S.E.: This is the standard error of the mean, which is the standard deviation
divided by the square root of N. It estimates the standard deviation of the
means of all possible samples of size N.
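The definitions above can be checked against the WT1 output with a few lines of Python. This is a sketch written for this manual, not MINISTAT's own code; the function name `describe` is hypothetical. Dividing by N rather than N - 1 is an assumption, chosen because it reproduces the VAR of 393.25 shown above.

```python
import math

# WT1 column of the DIET file (weights before the diet).
wt1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]


def describe(scores):
    """Return the statistics listed by the DESC procedure."""
    n = len(scores)
    total = sum(scores)
    mean = total / n
    # Assumption: divide by N, which matches the VAR printed above.
    var = sum((x - mean) ** 2 for x in scores) / n
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)   # standard error of the mean
    return {
        "MEAN": mean, "VAR": var, "S.D.": sd, "S.E.": se,
        "SUM": total, "N": n, "MAX": max(scores), "MIN": min(scores),
    }


stats = describe(wt1)
for name, value in stats.items():
    print(name, round(value, 6))
```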
2) FREQ
This procedure generates a histogram or bargraph. It shows how many subjects
fall within each of a number of consecutive values or value ranges of a
variable. The program first asks for a variable name, and then for an
interval size. Choose an interval size which is a fraction of the
total range, but at least equal to the unit of measurement. Using an
interval size of 2, height is distributed like this:
    60  *******                (1)
    62  *******                (1)
    64  **************         (2)
    66  **************         (2)
    68  *********************  (3)
    70  *******                (1)
The top bar shows that there is one subject who is at least 60 inches but is
shorter than 62 inches. We can see that the modal interval, the one with
the most subjects, is the one with subjects who are at least 68 inches tall,
but shorter than 70 inches.
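The grouping behind this display can be sketched in Python. The function name `frequency_table` is hypothetical, and two details are assumptions inferred from the height example rather than documented behavior: intervals are taken to start at the smallest score, and each subject is drawn as seven asterisks.

```python
# HT. column of the DIET file (heights in inches).
heights = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]


def frequency_table(scores, interval):
    """Count how many scores fall in each interval [lower, lower + interval)."""
    start = min(scores)          # assumption: first interval starts at the minimum
    counts = {}
    for x in scores:
        lower = start + ((x - start) // interval) * interval
        counts[lower] = counts.get(lower, 0) + 1
    top = max(counts)
    # Include empty intervals so the bars stay consecutive.
    return {lower: counts.get(lower, 0)
            for lower in range(start, top + interval, interval)}


table = frequency_table(heights, 2)
for lower, count in table.items():
    # Assumption: seven asterisks per subject, as in the display above.
    print(lower, "*" * (7 * count), "(%d)" % count)
```

The interval with the largest count is the modal interval; here that is the one starting at 68.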
3) REGR
This procedure generates a scattergram, a regression equation, a
correlation coefficient, and a t-value with associated degrees
of freedom. All of these statistics allow us to examine the relationship
between two variables.
The scattergram is a plot of the values of one variable against the values
of another. A strong positive relationship, as one might expect to find
between variables such as height and weight or job prestige and income, will
show all of the points tightly clustered in a straight line going from
the lower left to the upper right. A weak relationship, such as that
between nose length and IQ, would show points scattered about in a more
or less random fashion. A strongly negative relationship, as one might
find between blood alcohol levels and performance on a driving test,
would show points clustered tightly from the upper left down to the
lower right. The first variable entered is the X variable, shown along
the bottom of the graph. The second variable entered is the criterion
or Y variable, and is shown along the side.
The regression equation is shown below the scattergram. This is the formula
which does the best job of predicting the Y variable from the X variable.
The correlation coefficient (R) quantifies the degree of relationship between
the two variables. The value of R can range from -1, a perfect negative
relationship, through zero, no relationship, to +1, a perfect positive
relationship. The t-value along with the degrees of freedom allows one
to test if the relationship is strong enough to be generalized beyond
the sample to the population in general. The P value shows the
level of significance for the t-value, that is, the probability of
obtaining a relationship at least this strong by chance alone if there
were no real relationship in the population. A P of less than .05 is
generally considered significant.
If we enter height (HT.) as our first variable, and weight before the
diet as our second variable (WT1), we get these results:
WT1 = (6.094*HT.) + -267.906
R=0.92
T=6.631 DF=8 P<.001
The regression formula provides a way to predict weight, given a person's
height. If someone is five feet, or 60 inches tall, we could predict that
they would weigh (6.094*60)-267.906, or 97.7 pounds.
The R of .92 is relatively high; it shows us that the relationship is
strongly positive, and that we can predict one variable from the other
with relatively little error.
The t, df, and p values can tell us whether the R is high enough to be
generalized to the population from which we drew our sample, or whether
it might be a fluke found in this particular sample. P<.001 means that,
if there were no correlation between height and weight in the population,
a sample correlation this strong would turn up less than one time in 1000.
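The regression line, R, and t-value above can be reproduced with the usual least-squares formulas. The sketch below is our own illustration in Python (the function name `regress` is hypothetical), and it stops short of the P value, which requires a t-distribution table or a statistics library.

```python
import math

ht = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]             # X variable (HT.)
wt1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]  # Y variable (WT1)


def regress(x, y):
    """Least-squares line y = slope*x + intercept, with r, t and df."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return slope, intercept, r, t, df


slope, intercept, r, t, df = regress(ht, wt1)
print("WT1 = (%.3f*HT.) + %.3f" % (slope, intercept))
print("R=%.2f  T=%.3f  DF=%d" % (r, t, df))

# Predicted weight for someone 60 inches tall.
predicted = slope * 60 + intercept
print(round(predicted, 1))
```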
4) CHI2
This procedure gives a chi-square value, the associated degrees of freedom,
and a contingency table.
A chi-square is a measure of association between two variables with nominal
level data. Data is called nominal level when it is used only to designate
groups, not as scores or rankings. Zip codes are an example of nominal level
data. In our example, SEX and RACE are nominal level variables.
By looking at the association between SEX and RACE, we can determine whether
the ratio of males to females differs significantly according to race.
The run looks like this:
    >?SEX
    >?RACE
    SEX   RACE   FREQ   FR EXP
    1     1      1      1.5
    1     2      2      2
    1     3      2      1.5
    2     1      2      1.5
    2     2      2      2
    2     3      1      1.5
    CHI2=0.667  DF=2  N.S.
The chi-square value which was derived is below the level needed for
significance at the .05 level. This is indicated by the notation "N.S."
which means "not significant." The above results indicate that the proportion
of males to females does not differ significantly among the three different
races in our sample. If there were a significant relationship,
instead of "N.S.," we would see "P<.05."
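The observed and expected frequencies in the contingency table, and the chi-square value, can be worked out as follows. This Python sketch uses the standard Pearson chi-square formula; the function name `chi_square` is hypothetical and the code is not taken from MINISTAT.

```python
from collections import Counter

sex = [1, 2, 1, 1, 2, 2, 1, 2, 1, 2]
race = [1, 2, 3, 2, 3, 1, 2, 2, 3, 1]


def chi_square(rows, cols):
    """Return chi-square, degrees of freedom, and the contingency table."""
    n = len(rows)
    observed = Counter(zip(rows, cols))
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    chi2 = 0.0
    table = []
    for r in sorted(row_totals):
        for c in sorted(col_totals):
            # Expected count if the two variables were unrelated.
            expected = row_totals[r] * col_totals[c] / n
            freq = observed[(r, c)]
            chi2 += (freq - expected) ** 2 / expected
            table.append((r, c, freq, expected))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, table


chi2, df, table = chi_square(sex, race)
for row in table:
    print(*row)
print("CHI2=%.3f DF=%d" % (chi2, df))
```

With 2 degrees of freedom, the chi-square needed for significance at the .05 level is 5.991, so the value found here falls well short of it.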
5) T:US
This procedure performs a t-test for unrelated samples. It reports the
mean, standard deviation, and N of each group, as well as the pooled
standard error, the t-value, degrees of freedom, and p or probability
value.
In this procedure, a criterion variable is split into two groups, and
the means of the two groups are compared. Any of the other variables
can be used as grouping variables. The grouping variable is entered first,
then the criterion variable. Finally, two value ranges are specified for
the grouping variable. Subjects who fall into the first range are
designated as "LEVEL 1", those falling within the second range are
"LEVEL 2."
Suppose we would like to see if males are, on the average, different in
height from females. The run would look like this:
    >?SEX
    >?HT.
    L1,H1: 1,1
    L2,H2: 2,2
              LEVEL 1   LEVEL 2
    MEAN      68.6      63.8
    SDEV      1.02      2.315
    N         5         5
    --------------------------------
    S.E.=1.265
    T=3.795  DF=8  P=5E-03
SEX is the variable which defines the two groups, or levels, and so it
was entered first. HT. is the criterion variable, and was entered second.
In response to the "L1,H1:" prompt, we entered 1,1. These ones indicate
that the first group of subjects in which we are interested ranges from
one to one, inclusive, on the variable "SEX." This is all the males. The
second group ranges from two to two, and includes all of the females.
Looking at our results, we see that males are, on the average, 68.6 inches
tall, or just under 5'9". The females are just under 5'4". Our p-value
is 5E-03, which is 5 times 10 to the -3 power, or .005. Since this is
less than the conventional .05 level of significance, we can conclude that
males are, on the average, taller than females in the population from which
our sample was drawn.
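The figures in this run can be reproduced with the pooled-variance t-test. The Python sketch below is our own illustration; the function name `unrelated_t` and the way the two ranges are passed in are hypothetical. The SDEV values are computed by dividing by N, an assumption that matches the 1.02 and 2.315 printed above, while the standard error pools the two groups over N1 + N2 - 2 degrees of freedom.

```python
import math

sex = [1, 2, 1, 1, 2, 2, 1, 2, 1, 2]            # grouping variable
ht = [68, 60, 69, 70, 63, 65, 69, 67, 67, 64]   # criterion variable


def unrelated_t(group, scores, range1, range2):
    """Pooled-variance t-test; each range is an inclusive (low, high) pair."""
    def select(low, high):
        return [s for g, s in zip(group, scores) if low <= g <= high]

    def sum_sq(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    level1, level2 = select(*range1), select(*range2)
    n1, n2 = len(level1), len(level2)
    mean1, mean2 = sum(level1) / n1, sum(level2) / n2
    df = n1 + n2 - 2
    pooled_var = (sum_sq(level1) + sum_sq(level2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return {
        "MEAN": (mean1, mean2),
        # Assumption: SDEV divides by N, matching the output above.
        "SDEV": (math.sqrt(sum_sq(level1) / n1), math.sqrt(sum_sq(level2) / n2)),
        "N": (n1, n2),
        "S.E.": se,
        "T": (mean1 - mean2) / se,
        "DF": df,
    }


result = unrelated_t(sex, ht, (1, 1), (2, 2))
for name, value in result.items():
    print(name, value)
```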
6) T:RS
This procedure performs a t-test on related samples. This test is also
called a matched-pairs or repeated-measures t-test. The procedure
provides a t-value, associated degrees of freedom, and p or
probability level. The t-value is positive if the first variable
entered has the larger mean; it is negative if the second variable
has the larger mean.
Related-samples means that the scores are expected to be correlated, and
can reasonably be analyzed in pairs. Such is the case when the same
subjects are exposed to two different experimental conditions, or when
some measure is taken before and after a certain treatment. By analyzing
the difference between pairs of scores instead of groups of scores, the test
becomes more sensitive, and significant results become more easily
obtained.
In these data, perhaps the most interesting question is whether the subjects
weighed significantly less after the diet than they did before it. The
run would look like this:
    >? WT1
    >? WT2
    MEAN    135.5     131.9
    SDEV    19.831    17.768
    T=2.785  DF=9  P=.021
Since our p-value of .021 suggests that such results would be rare (obtained
only 21 out of 1000 times) with an ineffective diet, we can conclude that
the diet would be effective if used by others in the population from which
our sample was drawn.
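Because the test works on the difference within each pair, the calculation reduces to a one-sample t-test on those differences. The Python sketch below is an illustration written for this manual (the function name `related_t` is hypothetical); it reproduces the t-value and degrees of freedom above but not the P value, which needs a t-distribution table or a statistics library.

```python
import math

wt1 = [143, 105, 162, 168, 115, 123, 149, 145, 123, 122]  # before the diet
wt2 = [140, 103, 153, 160, 118, 125, 147, 140, 119, 114]  # after the diet


def related_t(first, second):
    """Matched-pairs t-test on the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (divide by n - 1).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var_diff / n)
    return mean_diff / se, n - 1, diffs


t, df, diffs = related_t(wt1, wt2)
print(diffs)
print("T=%.3f DF=%d" % (t, df))
```

The sign of t follows the rule given above: it is positive here because WT1, the first variable entered, has the larger mean.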
7) ALPHA
Note: This function is not available in the public domain version of MINISTAT.
This procedure calculates coefficient alpha, a measure of the internal
consistency and reliability of a test. The procedure asks first for
number of items. Enter the number of test items for which you will be
assessing reliability. It then prompts you for variables, which are the
names of each test item. The output shows coefficient alpha, and the
correlation of each item with the sum of all other test items. This
allows you to judge which items are inconsistent with the rest of the
test, and which you should consider discarding to increase the test's
reliability.
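For readers without the registered version, coefficient alpha can be sketched in Python using the standard formula: k/(k-1) times one minus the ratio of the summed item variances to the variance of the total scores. Everything in this example is an assumption made for illustration: the function names are hypothetical, the item scores are invented (the DIET file contains no test items), and the code is not taken from MINISTAT.

```python
import math


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_alpha(items):
    """items: one list of scores per test item, all of the same length."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(item) for item in items)
    alpha = k / (k - 1) * (1 - item_variance / variance(totals))
    # Correlate each item with the sum of all the other items.
    item_rest = []
    for item in items:
        rest = [t - x for t, x in zip(totals, item)]
        item_rest.append(pearson_r(item, rest))
    return alpha, item_rest


# Invented scores for four test items answered by six subjects.
items = [
    [3, 4, 2, 5, 1, 4],
    [2, 4, 3, 5, 1, 3],
    [3, 5, 2, 4, 2, 4],
    [4, 1, 3, 2, 5, 2],   # runs against the other three items
]
alpha, item_rest = coefficient_alpha(items)
print(round(alpha, 3))
print([round(r, 3) for r in item_rest])
```

In this made-up data the fourth item correlates negatively with the sum of the others, which marks it as the kind of item one would consider dropping.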
*****
UTILITIES
A) SAVE
This function allows you to save your current file to the disk, so that
you can reanalyze the data at a later time. It asks if you want to
change the file name, so if the file has been modified, the new file
can be saved without erasing the original file.
B) INFO
This command allows you to view information about the current file.
It will show the file name, number of observations (N), and the
variable names.
C) OLD
This command retrieves a file that had been previously created with
the E) NEW command. When using this command, if there is already a data
file in memory it will be erased. To prevent accidental data loss,
the program asks if you are sure that you want to load a new file.
D) DIR
This command lists the data files on the current disk. It will list
only data files created by this program.
E) NEW
This command allows you to create a new data file. See the section under
"SETTING UP THE FILE" in this document for more details.
F) KILL
This command can be used to delete any files that have been created by
MINISTAT. Enter the name of the file at the prompt, and you will either
be told that the file has been killed, or that the file cannot be found.
G) COMP
This option allows you to transform a variable to create a new variable.
For instance, we might want to convert the WT1 variable in our DIET file
from pounds to kilograms. We can do this by multiplying by 0.4536, since
one pound is about 0.4536 kilograms. When we enter the COMP procedure, we
are asked "VARIABLE OR CONSTANT?" Type C for CONSTANT if you are going to
transform your variable with a constant, or type V for VARIABLE if you are
going to use another variable to transform it. In our example, we will
type C, since we are using a constant to transform pounds to kilograms.
Next, input the variable or constant which will be used -- in our case,
0.4536. The next step is to select the operation, i.e., addition,
multiplication, etc. We are going to multiply, so we press "*". Then we
enter the variable to be transformed, WT1. The program shows us the
transformation equation, 0.4536*WT1. We are then asked for the name of the
new variable, the one we have created by the transformation. We can either
write over an old variable, or we can create a new one. For this example,
we will pick the name WT3.
The screen will look like this after the transformation:
    VARIABLE OR CONSTANT? CONSTANT
    ? 0.4536
    SELECT OPERATION: *
    >? WT1
    0.4536*WT1
    NEW VARIABLE>? WT3
    ??
    COMPUTATION COMPLETED
You can use the INFO procedure to reassure yourself that the new
variable is there.
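The idea behind COMP, combining a constant or a variable with an existing variable to produce a new column, can be sketched in Python. The function name `comp`, the set of four operations, and the operand order (first operand, then operation, then the variable being transformed) are all assumptions made for this illustration. The example uses the VARIABLE option to store the weight each subject lost under a made-up name, LOSS.

```python
import operator

# Assumption: the four basic arithmetic operations.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

diet = {
    "WT1": [143, 105, 162, 168, 115, 123, 149, 145, 123, 122],
    "WT2": [140, 103, 153, 160, 118, 125, 147, 140, 119, 114],
}


def comp(data, operand, symbol, source, target):
    """Store `operand <symbol> source` in data[target].

    `operand` is either a number (CONSTANT) or the name of another
    variable in `data` (VARIABLE).
    """
    operation = OPERATIONS[symbol]
    if isinstance(operand, str):
        left = data[operand]                      # another variable
    else:
        left = [operand] * len(data[source])      # a constant, repeated
    data[target] = [operation(a, b) for a, b in zip(left, data[source])]


# VARIABLE option: weight lost = WT1 - WT2, saved as a new variable.
comp(diet, "WT1", "-", "WT2", "LOSS")
print(diet["LOSS"])

# CONSTANT option: add 1 to every WT2 value, overwriting nothing.
comp(diet, 1, "+", "WT2", "WT2PLUS1")
print(diet["WT2PLUS1"][:3])
```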
H) HELP
Note: This procedure is not available in the public domain version.
HELP will bring you to a help menu very similar to the main menu.
Requesting any procedure while in the HELP area will give a single
screen describing how to use the procedure. The <at> key will return
you to the main menu.
*****
PRINTING
To print out a screen, first turn on your printer, and then
hit the <F1> key.
******************************************************
* *
* SHAREWARE MESSAGE *
* *
******************************************************
MINISTAT is a shareware program -- that is, try
it out first, and if you find it useful & expect
to continue using it, you pay a shareware fee --
a fee which is generally much lower than what you
would pay for equivalent commercially available
software. There are two options for meeting your
shareware obligation with MINISTAT:
1) Send $10.00 and you will receive a printed
copy of this manual, along with a disk containing
an enhanced version of the program. This version
includes a full HELP menu and an additional
statistical procedure (see ALPHA above). As a
registered user, you will be notified of program
updates, which you will be able to receive if you
send me a blank disk and a self-addressed stamped
envelope.
2) Actually, the $10.00 above barely covers expenses,
and will certainly not make me rich. Like most shareware
authors (I think), the real satisfaction comes from knowing
that people are using and enjoying the software. So here is
option #2: your shareware obligation can be met completely
just by COMMENTING on the software. Send me a note by mail
or CIS e-mail to let me know how you are using the program,
what you like, and what you found confusing. You can also
send any recommendations for improvement. If you choose to
make a contribution of less than $10, you are welcome
to do that also, of course.
I suspect we all suffer from at least some "shareware guilt,"
from all those programs we have downloaded and not yet paid
for -- so with the two options above you can easily rid
yourself of some of this destructive emotion at low cost
or at no cost. Thanks for your interest -- have fun
with MINISTAT!
--Jon Rich, Ph.D.
23212-6 Orange Ave.
El Toro, CA 92630-6918
CIS 73367,1326